


Helping man use images to communicate.

## A236<sup>TM</sup> Parallel Video Digital Signal Processor Chip

the System Designer's Parallel DSP Chip (TM)

the first Video DSP Chip

**Data Sheet Summary** 

Rev. February 1, 2001.

#### **Table of Contents**

- <u>1. Features</u>
- <u>2. Packaging</u>
- <u>3. Applications</u>
- <u>4. Functional Description</u>
- <u>5. Block Diagram</u> with links to additional description

See the <u>A436 Video DSP Chip</u>, which uses a fourth generation Ax36 core. It is 20x more powerful for imaging than the A236 and is signal pin-compatible.

#### **1. Features**

- Fully user-programmable, stand-alone, Video Digital Signal Processor Chip optimized for real-time image and video processing
- Specifically intended for high volume, low cost, high performance, mass market embedded applications
- Provides simultaneous, continuous, video capture, processing AND display without any dedicated frame buffers

- Simple hardware design is easy to understand and use
- Video interface device drivers, sample board-level schematics, application notes and training videos speed you to market
- Specifically designed for ease of programming using our parallel-enhanced, ANSI-compatible C compiler
- Existing C programs can be compiled to run on the A236 Chip, and performance-sensitive loops can be modified to use our parallel enhancements to C
- Programs are small, execute very efficiently and can be optimized easily, and their execution speeds can be forecast easily
- Suite of software development tools puts you in full control of the program
- Highly integrated system-level building block combines many functions into a single chip and is easy to design into systems
- Glueless interfaces to common video encoder and decoder chips and low cost Synchronous DRAM
- Simple interfaces to low cost digital image sensors, high speed, high resolution, digital image sensors and other data streaming devices
- Device drivers to interface A236 Chip to common video encoder and decoder chips are available
- 32-bit *Structure Processing Instruction Set* provides fast instruction execution and single-instruction parallel operations on C-like parallel data structures
- Four 2x8-/16-bit parallel arithmetic units, one 24-bit scalar arithmetic unit and on-chip Motion Estimation/Pattern Matching Coprocessor for high performance
- Four 16-bit x 16-bit multiply-adds, each with 40-bit accumulation, per CPU Clock
- Supports 16-bit parallel arithmetic, and normal and saturation, signed and unsigned, 8-bit parallel arithmetic
- Native instruction-level support for efficient operations upon monochrome and YUV and SRGB composite color video data
- Superior data movement capabilities including extremely powerful, parallel-byte and -word addressing on arbitrary byte addresses
- Linear 16 MB address space is used for all program and data storage for ease of programming
- On-chip, synchronous, burst pipelined, 64-bit wide, 1 KB Instruction and Data Caches with efficient, 64-byte transfers
- Powerful memory management: programs simply address image data as it is needed; no block moves are required
- 32-bit wide, 100 MHz, 400 MB/s memory port to various configurations of external, low cost, high density Synchronous DRAMs
- Memory port is *adaptively timed* for 100% utilization of speed of Synchronous DRAMs
- Three 16-bit, double-buffered, bidirectional, packet capable and *video-aware* DMA ports connect directly to common video chips
- DMA ports can access 32-bit address space of host processors and control I/O devices
- General purpose, RS-232 serial port with programmable baud rate
- Serial bus port for control of peripherals
- Asynchronous port design: all ports are clocked totally independently of one another and of the CPU for maximum performance
- Multiple A236 Chips can easily be used together in serial or parallel when even higher performance is required
- Low cost, 0.6 µm, 5 V (3.3 V DRAM port), triple layer metal, standard cell CMOS in 208-pin PQFP package
- Uses Oxford Micro Devices' third generation, Ax36 Video DSP core (see <u>A336</u> and <u>A436</u> Video DSPs for fourth generation Ax36 core)

#### **2. Packaging**



Photo of A236 Chip in 208-pin Plastic Quad Flat Pack

#### **3. Applications**

| Real-time video/image capture, processing and display | Data acquisition and formatting, lens correction    |
|--------------------------------------------------------|-----------------------------------------------------|
| Multimedia, digital office equipment                   | Biometrics, neural networks and pattern recognition |
| Signal processing and video effects generation         | Communications, encryption, decryption              |
| Programmable video compression and decompression       | Internet video appliances and smart cards           |

#### **4. Functional Description**

The *A236 Chip* is a versatile, stand-alone, fully user-programmable, general purpose building block for real-time digital image and video signal processing. Its unique combination of ease of programming, parallel processing, three video-aware DMA ports, simple and powerful memory management that automatically accesses data when it is needed without block moves, and a Synchronous DRAM port provides more flexibility, much better performance, more memory, higher system-level integration for ease of use, and lower cost than other fast DSPs. It has: (a) an enhanced single-instruction multiple-data (SIMD) architecture with four 2x8- or 16-bit parallel arithmetic units that accumulate products to 40 bits and have a total of 256 16-bit registers; (b) a 64-bit wide, 1 KB, 2-way set-associative, synchronous, burst-pipelined data cache with sixteen 64-byte pages; (c) a 64-bit wide, 1 KB, 2-way set-associative, synchronous, burst-pipelined instruction cache with sixteen 64-byte pages; (d) a single, 32-bit instruction unit supporting single-instruction operation on C-like parallel data structures; (e) a 24-bit scalar arithmetic unit for program control and for computing data and program addresses and loop counts; (f) a Crossbar Switch that passes information among the on-chip arithmetic units and functions as a 64-bit barrel shifter with 8-bit increments; and (g) barrel shifters built into the scalar and parallel arithmetic units.
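The multiply-add behavior described in item (a) can be pictured with a small sketch in portable C. It is only a scalar model written for illustration: the helper name `mac4`, the interleaved operand layout, and the range check are assumptions, and the overflow behavior of the actual 40-bit accumulators is not stated in this summary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* four parallel multiply-add units, per the data sheet */

/* Limits of a signed 40-bit accumulator. */
#define ACC40_MAX ((INT64_C(1) << 39) - 1)
#define ACC40_MIN (-(INT64_C(1) << 39))

/* Hypothetical helper: accumulate LANES independent dot products.
 * a and b each hold n groups of LANES interleaved 16-bit samples. */
static void mac4(const int16_t *a, const int16_t *b, size_t n,
                 int64_t acc[LANES])
{
    for (size_t i = 0; i < n; i++) {
        for (size_t lane = 0; lane < LANES; lane++) {
            /* 16 x 16 -> 32-bit product, widened before accumulation. */
            int32_t product = (int32_t)a[i * LANES + lane] *
                              (int32_t)b[i * LANES + lane];
            acc[lane] += product;
            /* Assumption: the sketch only checks that the running sum
             * still fits in 40 bits; it does not model wrap or clamp. */
            assert(acc[lane] >= ACC40_MIN && acc[lane] <= ACC40_MAX);
        }
    }
}
```

Each pass of the inner loop stands in for what the chip is described as doing in one CPU clock: four 16-bit by 16-bit products, each added to its own wide accumulator.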

The *A236 Chip's* general purpose *Structure Processing Instruction Set* is extremely powerful and handles a wide range of high performance applications. It supports single-instruction, parallel operations on C-like parallel data structures. A single instruction can address a set of 4 or 8 parallel operands in memory, fetch the parallel operands, operate upon all of the parallel operands and compute a new memory address. Most instruction words are 32 bits long and execute at the rate of one per CPU clock cycle. When processing four 16-bit operands, the equivalent of 12 instructions on a conventional RISC CPU are typically executed by the A236 for every instruction word as a result of its efficient parallel architecture, providing the equivalent of 480 MIPS with only a 40 MHz CPU clock. When processing eight 8-bit operands, the equivalent of 24 instructions on a conventional RISC CPU are typically executed by the A236 for every instruction word, providing the equivalent of 960 MIPS with only a 40 MHz CPU clock. Even higher performance is obtained during motion estimation/pattern matching (approximately 4,000 MIPS on a conventional RISC CPU). Superior data movement capability is also provided which is extremely useful when manipulating color video data in the YUV and SRGB formats and performing hierarchical or pyramid processing.
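The equivalent-MIPS figures quoted above follow from simple arithmetic, sketched here in C. The function names are illustrative only; the inputs are the 40 MHz clock and the per-word RISC equivalents stated in the text.

```c
#include <stdint.h>

/* Equivalent RISC MIPS = CPU clock in MHz x RISC-equivalent instructions
 * per A236 instruction word, given one instruction word per clock. */
static uint32_t equivalent_mips(uint32_t cpu_clock_mhz,
                                uint32_t risc_instr_per_word)
{
    return cpu_clock_mhz * risc_instr_per_word;
}

/* Inverse view: RISC-equivalent instructions implied per clock by a
 * quoted MIPS figure. */
static uint32_t risc_equiv_per_clock(uint32_t quoted_mips,
                                     uint32_t cpu_clock_mhz)
{
    return quoted_mips / cpu_clock_mhz;
}
```

With these, `equivalent_mips(40, 12)` gives 480 and `equivalent_mips(40, 24)` gives 960, matching the text, while `risc_equiv_per_clock(4000, 40)` shows that the motion estimation figure corresponds to roughly 100 RISC-equivalent instructions per clock.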

A single linear 16 MB memory address space is used, simplifying program development. Data is accessed simply by addressing it without the use of any block moves. The manipulation of quad and octal 8-bit and quad 16-bit parallel variables stored on *unaligned addresses* is supported to maximize memory utilization and performance. Color planes can be stored separately for manipulation then combined for video output, or composite input video data can be split into separate color planes. Signed and unsigned, and normal and saturating arithmetic can be performed on four 8-bit parallel variables simultaneously using 16-bit precision. Normal and saturating arithmetic can be performed on eight 8-bit parallel variables simultaneously using 9-bit precision.
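The difference between normal and saturating arithmetic, and the reason 9-bit or 16-bit intermediate precision matters, can be shown with a minimal scalar sketch in C. The function names and lane constants are hypothetical; the chip performs these operations in parallel in one instruction, whereas this model simply loops over the lanes.

```c
#include <stdint.h>

#define OCT_LANES 8   /* eight 8-bit operands per 64-bit word */
#define QUAD_LANES 4  /* four 8-bit operands held at 16-bit precision */

/* Normal (wrap-around) add: each sum is reduced modulo 256. */
static void oct_add_wrap(const uint8_t a[OCT_LANES],
                         const uint8_t b[OCT_LANES],
                         uint8_t out[OCT_LANES])
{
    for (int i = 0; i < OCT_LANES; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Saturating unsigned add: the sum of two 8-bit values needs 9 bits,
 * so it is formed in a wider type and clamped to 255. */
static void oct_add_sat(const uint8_t a[OCT_LANES],
                        const uint8_t b[OCT_LANES],
                        uint8_t out[OCT_LANES])
{
    for (int i = 0; i < OCT_LANES; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Saturating signed add on four lanes, using a 16-bit intermediate. */
static void quad_add_sat_s8(const int8_t a[QUAD_LANES],
                            const int8_t b[QUAD_LANES],
                            int8_t out[QUAD_LANES])
{
    for (int i = 0; i < QUAD_LANES; i++) {
        int16_t sum = (int16_t)(a[i] + b[i]);
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        out[i] = (int8_t)sum;
    }
}
```

For pixel data, saturation is usually the desired behavior: brightening a pixel of value 250 by 10 should give 255 (white), not wrap around to 4 (near black).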

**Several application-specific enhancements are provided.** For motion estimation in video compression, color plane alignment in scanners and pattern matching in fingerprints, *Pixel Distance* computes the sum of the absolute values of the differences between four or eight pairs of pixels from two sets of four or eight 8-bit pixels every CPU clock cycle, with one set of operands coming from memory; the sum is accumulated to 16 bits to handle large blocks. The best match is also tracked. For video overlay operations such as chroma keying, four or eight 8-bit or four 16-bit binary masks can be computed simultaneously, eliminating the need for most jump instructions to merge two images. For convolution and the updating of video frames, the addressing and use of successive 64-bit words on *any address* is supported at the full CPU clock rate to provide a sliding window.
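As a rough scalar model of two of these enhancements, the sketch below shows a sum of absolute differences with best-match tracking and a branch-free masked merge for chroma keying. All function names, the 1-D search, and the single-byte key are assumptions made for illustration; they are not the chip's instructions or the compiler's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two blocks of n 8-bit pixels.
 * The 16-bit accumulator mirrors the data sheet; n must stay <= 257
 * so that n * 255 cannot overflow it. */
static uint16_t pixel_distance(const uint8_t *ref, const uint8_t *cand,
                               size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + abs((int)ref[i] - (int)cand[i]));
    return sum;
}

/* Hypothetical 1-D search: slide an n-pixel reference along a search
 * line and return the offset with the smallest distance. */
static size_t best_match(const uint8_t *ref, size_t n,
                         const uint8_t *search, size_t search_len,
                         uint16_t *best_distance)
{
    size_t best_offset = 0;
    uint16_t best = UINT16_MAX;
    for (size_t off = 0; off + n <= search_len; off++) {
        uint16_t d = pixel_distance(ref, search + off, n);
        if (d < best) {
            best = d;
            best_offset = off;
        }
    }
    *best_distance = best;
    return best_offset;
}

/* Branch-free merge: where a foreground pixel equals the key value the
 * mask is 0x00 and the background shows through; otherwise it is 0xFF. */
static void chroma_key_merge(const uint8_t *fg, const uint8_t *bg,
                             uint8_t key, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = (uint8_t)-(fg[i] != key);
        out[i] = (uint8_t)((fg[i] & mask) | (bg[i] & (uint8_t)~mask));
    }
}
```

The merge loop has no data-dependent jump in its body, which is the property the text attributes to the computed binary masks.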

The *A236 Chip* is specifically designed to be programmed in C. Our breakthrough **Symbolic Parallel Programming Method** and **Parallel Programming Model** are implemented in our **Parallel-Enhanced**, **ANSI-Compatible C Compiler**, enabling the A236 Chip to be programmed quickly and easily using a familiar, scalar processing programming model. *Existing C programs can be compiled to run on the A236 Chip*. Performance-sensitive loops can be rewritten using the parallel enhancements in our C compiler to use the parallel processing capability of the A236 Chip. Using a high level form of operator overloading and built-in data structures including **quad\_long**, **quad\_int**, **quad\_short** and **oct\_short**, simultaneous operations upon multiple data elements can be coded as simply as quad\_B += quad\_A. No cryptic macros, in-line assembly code or function calls are required to utilize the parallel processing in the A236 Chip. Careful code generation and optimization are done to avoid needless loads, stores and no-ops.
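To make the meaning of a statement such as `quad_B += quad_A` concrete, here is a sketch of the same operation in standard C, which has no operator overloading. The type `quad16_t` and the helper `quad16_add_assign` are hypothetical stand-ins; the element widths and memory layout of the compiler's own quad and oct types are assumed, not taken from this summary.

```c
#include <stdint.h>

/* Hypothetical stand-in for a four-element parallel type. The element
 * width and layout of the compiler's built-in types are assumptions. */
typedef struct {
    int16_t lane[4];
} quad16_t;

/* Element-by-element meaning of `quad_B += quad_A`. */
static void quad16_add_assign(quad16_t *b, const quad16_t *a)
{
    for (int i = 0; i < 4; i++)
        b->lane[i] = (int16_t)(b->lane[i] + a->lane[i]);
}
```

On a conventional compiler this costs a function call and a four-iteration loop; the point of the parallel-enhanced compiler, as described above, is that the single overloaded statement expresses all four additions at once.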

The A236 Chip has six ports. Three 16-bit, bidirectional, asynchronous, double-buffered, packet-capable and video-aware DMA ports are provided for loading data and passing information among multiple A236 Chips. No glue logic is required to connect common video decoder and encoder chips to the A236 Chip. Very simple interfaces are provided to low cost digital image sensors, to high speed, high resolution digital image sensors and to other data streaming devices. All frame buffering is provided within the DRAM connected to the A236 Chip, under control of the A236 Chip; no external frame buffers are required. Polarities of the video control signals are software programmable for maximum flexibility. Any number of pixels per line and of lines per frame is supported in progressive and interlaced modes. Two video inputs and one video output can be supported simultaneously.

A 32-bit wide, high performance memory port with 64-byte bursts provides a 400 MB/s interface to inexpensive Synchronous DRAMs for virtually instantaneous access to up to 16 MB of program, data and I/O buffers, sustaining high performance for live video, large data sets and large programs. No memory bus resizing is done, so the full memory bandwidth is available at all times. The memory interface is *adaptively timed* to compensate for the relatively slow access time of Synchronous DRAMs, providing 100% utilization for maximum performance.

A serial bus port is provided for controlling peripherals such as video encoders and decoders, and for loading small programs and/or Basic I/O System Software (BIOS) from serial EEPROM into the Synchronous DRAMs via the A236 Chip upon reset. Serial EEPROMs can be loaded by the A236 Chip for ease of modification.

In packet mode, which is fully handshaked, all of the DMA ports can be used to access a host's 32-bit memory address space for obtaining data or transmitting results, or to control I/O devices such as mass-storage units, enabling the A236 Chip to be used as the CPU in low cost, stand-alone applications. An RS-232 port with programmable baud rate can be used for programmed serial I/O and to provide test access to the A236 Chip for in-situ application development.
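The memory port figures quoted above can be checked with a short calculation, sketched in C below. The helper names are illustrative. The video format used in the usage note (720 x 480 pixels, 2 bytes per pixel, 30 frames per second) is an assumption chosen only as an example and does not come from this data sheet.

```c
#include <stdint.h>

/* Peak port bandwidth in MB/s: bus width in bytes x clock in MHz. */
static uint32_t peak_mb_per_s(uint32_t bus_bits, uint32_t clock_mhz)
{
    return (bus_bits / 8u) * clock_mhz;
}

/* Number of bus transfers needed to move one burst. */
static uint32_t transfers_per_burst(uint32_t burst_bytes, uint32_t bus_bits)
{
    return burst_bytes / (bus_bits / 8u);
}

/* Bandwidth of one video stream in bytes per second, for a
 * caller-supplied (assumed) frame size, pixel size and frame rate. */
static uint64_t stream_bytes_per_s(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

`peak_mb_per_s(32, 100)` reproduces the 400 MB/s figure, and `transfers_per_burst(64, 32)` shows that a 64-byte burst takes 16 transfers on the 32-bit port. Under the assumed format, `stream_bytes_per_s(720, 480, 2, 30)` is 20,736,000 bytes per second, so even three such streams moving through DRAM once each would use well under a quarter of the peak bandwidth.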

A basic system nucleus requires only three chips: an A236 Chip, a 32-bit Synchronous DRAM and a serial EEPROM. Even so, it provides the ability to *simultaneously and continuously* capture, process and display live video images. No external video capture or display buffers are required. Pixel, line, field and frame sync signals are directly supported by the video-aware, parallel DMA ports for the utmost ease of video interface. All ports are asynchronous to one another for maximum flexibility.

The *A236 Development Environment*, a suite of software development tools for the A236 Chip, runs under Microsoft Windows 95/98. The A236 Chip executes a single task in parallel, so simple, familiar, scalar processing programming techniques can be used, and a simple, single-task operating system can be used for software development. An assembler, parallel-enhanced ANSI-compatible C compiler, linker, loader, simulator and debugger are provided. A reduced functionality version is available from our Web page. The *most extensive* Help capability in the industry is also provided within the tools. A hardware/software evaluation kit for an IBM-compatible PC with a PCI bus and running Microsoft Windows 95/98 is also available.

#### **5. Block Diagram of A236 Parallel Video Digital Signal Processor Chip**



Click on an item in the Block Diagram to see an explanation of it.

## **Oxford Micro Devices, Inc.**

Ax36<sup>TM</sup> Parallel Image and Video Digital Signal Processor (DSP) Chips

Lantern Ridge Office Park; 731 Main Street, Building 2; Monroe, CT 06468 USA; tel. 203-445-0562; fax 203-445-0564; e-mail: <u>info@omdi.com</u>; Web: <u>http://www.omdi.com/</u>

Copyright © 1995 to 2001. All rights reserved. Patents pending. All trademarks are the property of their respective owners and are used for product identification purposes only.